{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "_execution_state": "idle",
    "_uuid": "051d70d956493feee0c6d64651c6a088724dca2a",
    "papermill": {
     "duration": 0.006668,
     "end_time": "2021-02-04T12:17:27.298263",
     "exception": false,
     "start_time": "2021-02-04T12:17:27.291595",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This notebook contains an example for teaching."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.005558,
     "end_time": "2021-02-04T12:17:27.309488",
     "exception": false,
     "start_time": "2021-02-04T12:17:27.303930",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# The Effect of Gun Ownership on Gun-Homicide Rates - proceeding"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.005661,
     "end_time": "2021-02-04T12:17:27.320615",
     "exception": false,
     "start_time": "2021-02-04T12:17:27.314954",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "In this lab, we estimate the effect of gun ownership on the homicide rate by a neural network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 1.158988,
     "end_time": "2021-02-04T12:17:28.485200",
     "exception": false,
     "start_time": "2021-02-04T12:17:27.326212",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "library(keras)\n",
    "library(lfe)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.006709,
     "end_time": "2021-02-04T12:17:28.498913",
     "exception": false,
     "start_time": "2021-02-04T12:17:28.492204",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "First, we need to load and preprocess the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 25.09111,
     "end_time": "2021-02-04T12:17:53.595895",
     "exception": false,
     "start_time": "2021-02-04T12:17:28.504785",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data <- read.csv(\"../input/gun-example/gun_clean.csv\") \n",
    "#################################  Find Variable Names from Dataset ########################\n",
    "\n",
    "varlist <- function (df=NULL,type=c(\"numeric\",\"factor\",\"character\"), pattern=\"\", exclude=NULL) {\n",
    "  vars <- character(0)\n",
    "  if (any(type %in% \"numeric\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.numeric)])\n",
    "  }\n",
    "  if (any(type %in% \"factor\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.factor)])\n",
    "  }  \n",
    "  if (any(type %in% \"character\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.character)])\n",
    "  }  \n",
    "  vars[(!vars %in% exclude) & grepl(vars,pattern=pattern)]\n",
    "}\n",
    "\n",
    "################################# Create Variables ###############################\n",
    "\n",
    "\n",
    "# Dummy Variables for Year and County Fixed Effects\n",
    "fixed  <- grep(\"X_Jfips\", names(data), value=TRUE, fixed=TRUE)\n",
    "year   <- varlist(data, pattern=\"X_Tyear\")\n",
    "\n",
    "# Census Control Variables\n",
    "census     <- NULL\n",
    "census_var <- c(\"^AGE\", \"^BN\", \"^BP\", \"^BZ\", \"^ED\", \"^EL\",\"^HI\", \"^HS\", \"^INC\", \"^LF\", \"^LN\", \"^PI\", \"^PO\", \"^PP\", \"^PV\", \"^SPR\", \"^VS\")\n",
    "\n",
    "for(i in 1:length(census_var)){\n",
    "  \n",
    "  census  <- append(census, varlist(data, pattern=census_var[i]))\n",
    "  \n",
    "}\n",
    "\n",
    "################################ Variables ##################################\n",
    "# Treatment Variable\n",
    "d     <- \"logfssl\"\n",
    "\n",
    "# Outcome Variable\n",
    "y     <- \"logghomr\"\n",
    "\n",
    "# Other Control Variables\n",
    "X1    <- c(\"logrobr\", \"logburg\", \"burg_missing\", \"robrate_missing\")\n",
    "X2    <- c(\"newblack\", \"newfhh\", \"newmove\", \"newdens\", \"newmal\")\n",
    "\n",
    "#################################  Partial out Fixed Effects ########################\n",
    "\n",
    "# New Dataset for Partiled-out Variables\n",
    "rdata    <- as.data.frame(data$CountyCode) \n",
    "colnames(rdata) <- \"CountyCode\"\n",
    "\n",
    "# Variables to be Partialled-out\n",
    "varlist <- c(y, d,X1, X2, census)\n",
    "\n",
    "\n",
    "# Partial out Variables in varlist from year and county fixed effect\n",
    "for(i in 1:length(varlist)){\n",
    "  form <- as.formula(paste(varlist[i], \"~\", paste(paste(year,collapse=\"+\"),  paste(fixed,collapse=\"+\"), sep=\"+\")))\n",
    "  rdata[, varlist[i]] <- lm(form, data)$residuals\n",
    "}\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.006178,
     "end_time": "2021-02-04T12:17:53.609824",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.603646",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# DML for neural nets\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.005978,
     "end_time": "2021-02-04T12:17:53.621881",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.615903",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The following algorithm comsumes $Y$,$D$ and $Z$, and learns the residuals $\\tilde{Y}$ and $\\tilde{D}$ via a neural network, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient β and the clustered standard error from the final OLS regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.019969,
     "end_time": "2021-02-04T12:17:53.647811",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.627842",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "DML2.for.NN <- function(z, d, y, nfold=2, clu, num_epochs, batch_size) {\n",
    "  nobs <- nrow(z) #number of observations\n",
    "  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices\n",
    "  I <- split(1:nobs, foldid)  #split observation indices into folds  \n",
    "  ytil <- dtil <- rep(NA, nobs)\n",
    "  cat(\"fold: \")\n",
    "  for(b in 1:length(I)){\n",
    "  # normalize the data\n",
    "  mean <- apply(z[-I[[b]],], 2, mean)\n",
    "  std <- apply(z[-I[[b]],], 2, sd)\n",
    "  z[-I[[b]],] <- scale(z[-I[[b]],], center = mean, scale = std)\n",
    "  z[I[[b]],] <- scale(z[I[[b]],], center = mean, scale = std)\n",
    "  # building the model                    \n",
    "  build_model <- function(){\n",
    "  model <- keras_model_sequential() %>% \n",
    "    layer_dense(units = 16, activation = \"relu\", \n",
    "                input_shape = dim(z[-I[[b]],][2]))%>% \n",
    "    layer_dense(units = 16, activation = \"relu\") %>% \n",
    "    layer_dense(units = 1) \n",
    "  \n",
    "    model %>% compile(\n",
    "    optimizer = \"rmsprop\", \n",
    "    loss = \"mse\", \n",
    "    metrics = c(\"mae\")\n",
    "    )  \n",
    "   }\n",
    "  model.Y <- build_model()\n",
    "  model.D <- build_model()                       \n",
    "  # fitting the model                   \n",
    "  model.D %>% fit(z[-I[[b]],], d[-I[[b]]],\n",
    "                    epochs = num_epochs, batch_size = batch_size, verbose = 0)                       \n",
    "  model.Y %>% fit(z[-I[[b]],], y[-I[[b]]],\n",
    "                    epochs = num_epochs, batch_size = batch_size, verbose = 0)\n",
    "  dhat <- model.D %>% predict(z[I[[b]],])\n",
    "  yhat <- model.Y %>% predict(z[I[[b]],])   \n",
    "  dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold\n",
    "  ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold                  \n",
    "  cat(b,\" \")\n",
    "        }\n",
    "  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other\n",
    "  data <- data.frame(cbind(ytil, dtil, as.matrix(clu)))\n",
    "  rfit <- felm(ytil ~ dtil|0|0|CountyCode,data=data)\n",
    "  coef.est <- coef(rfit)[2]  #extract coefficient\n",
    "  #HC <- vcovHC(rfit)\n",
    "  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error by County\n",
    "  cat(sprintf(\"\\ncoef (se) = %g (%g)\\n\", coef.est , se))  #printing output\n",
    "  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.006217,
     "end_time": "2021-02-04T12:17:53.660207",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.653990",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Estimating the effect with DLM for neural nets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.183124,
     "end_time": "2021-02-04T12:17:53.849424",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.666300",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Treatment Variable\n",
    "D    <- rdata[which(colnames(rdata) == d)]\n",
    "# Outcome Variable\n",
    "Y     <- rdata[which(colnames(rdata) == y)]\n",
    "# Construct matrix Z\n",
    "Z <- rdata[which(colnames(rdata) %in% c(X1,X2,census))]\n",
    "\n",
    "# inputs\n",
    "y_nn <- as.matrix(Y)\n",
    "d_nn <- as.matrix(D)\n",
    "z_nn <- as.matrix(Z)\n",
    "clu <- rdata[which(colnames(rdata) == \"CountyCode\")]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 154.216019,
     "end_time": "2021-02-04T12:20:28.071875",
     "exception": false,
     "start_time": "2021-02-04T12:17:53.855856",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#DML with a NN:\n",
    "set.seed(123)\n",
    "DML2.nn = DML2.for.NN(z_nn, d_nn, y_nn, nfold=2, clu, 100, 10)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.3"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 184.666009,
   "end_time": "2021-02-04T12:20:29.458861",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2021-02-04T12:17:24.792852",
   "version": "2.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
